Multi-view Acoustic Feature Learning Using Articulatory Measurements

نویسندگان

  • Sujeeth Bharadwaj
  • Raman Arora
  • Karen Livescu
  • Mark Hasegawa-Johnson
چکیده

We consider the problem of learning a linear transformation of acoustic feature vectors for phonetic frame classification, in a setting where articulatory measurements are available at training time. We use the acoustic and articulatory data together in a multi-view learning approach, in particular using canonical correlation analysis to learn linear transformations of the acoustic features that are maximally correlated with the articulatory data. We also investigate simple approaches for combining information shared across the acoustic and articulatory views with information that is private to the acoustic view. We apply these methods to phonetic frame classification on data drawn from the University of Wisconsin Xray Microbeam Database. We find a small but consistent advantage to the multi-view approaches combining shared and private information, compared to the baseline acoustic features or unsupervised dimensionality reduction using principal components analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Kernel CCA for multi-view learning of acoustic features using articulatory measurements

We consider the problem of learning transformations of acoustic feature vectors for phonetic frame classification, in a multi-view setting where articulatory measurements are available at training time but not at test time. Canonical correlation analysis (CCA) has previously been used to learn linear transformations of the acoustic features that are maximally correlated with articulatory measur...

متن کامل

Acoustic feature learning using cross-domain articulatory measurements

Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions. One limitation of this prior work is that the learned feature models are difficult to port to new datasets or domains, and articulatory data is not available for most spee...

متن کامل

Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition

This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, whic...

متن کامل

Combining acoustic and articulatory feature information for robust speech recognition

The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label ‘‘articulatory’’ include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...

متن کامل

Multiview Representation Learning via Deep CCA for Silent Speech Recognition

Silent speech recognition (SSR) converts non-audio information such as articulatory (tongue and lip) movements to text. Articulatory movements generally have less information than acoustic features for speech recognition, and therefore, the performance of SSR may be limited. Multiview representation learning, which can learn better representations by analyzing multiple information sources simul...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012